arXiv:1507.05073v2 [stat.CO] 4 Mar 2017 


Vol. 0 (0000) 

DOI: 10.1214/154957804100000000 


Sequential Quantiles via Hermite Series 
Density Estimation 

Michael Stephanou 

Department of Statistical Sciences^ University of Cape Town, Cape Town, South Africa 
e-mail: michael. stephanouOgmail. com 


Melvin Varughese 

Department of Statistical Sciences, University of Cape Town, Cape Town, South Africa, 
IBM Research, Johannehurg, South Africa 
e-mail: melvin. varugheseOgmail. com 

and 

Iain Macdonald 

Department of Actuarial science. University of Cape Town, Cape Town, South Africa, 
e-mail: iain. macdonaldOuct. ac. za 

Abstract: Sequential quantile estimation refers to incorporating observa¬ 
tions into quantile estimates in an incremental fashion thus furnishing an 
online estimate of one or more quantiles at any given point in time. Se¬ 
quential quantile estimation is also known as online quantile estimation. 

This area is relevant to the analysis of data streams and to the one-pass 
analysis of massive data sets. Applications include network traffic and la¬ 
tency analysis, real time fraud detection and high frequency trading. We 
introduce new techniques for online quantile estimation based on Hermite 
series estimators in the settings of static quantile estimation and dynamic 
quantile estimation. In the static quantile estimation setting we apply the 
existing Gauss-Hermite expansion in a novel manner. In particular, we ex¬ 
ploit the fact that Gauss-Hermite coefficients can be updated in a sequential 
manner. To treat dynamic quantile estimation we introduce a novel expan¬ 
sion with an exponentially weighted estimator for the Gauss-Hermite coeffi¬ 
cients which we term the Exponentially Weighted Gauss-Hermite (EWGH) 
expansion. These algorithms go beyond existing sequential quantile esti¬ 
mation algorithms in that they allow arbitrary quantiles (as opposed to 
pre-specified quantiles) to be estimated at any point in time. In doing so 
we provide a solution to online distribution function and online quantile 
function estimation on data streams. In particular we derive an analytical 
expression for the CDF and prove consistency results for the CDF under 
certain conditions. In addition we analyse the associated quantile estimator. 
Simulation studies and tests on real data reveal the Gauss-Hermite based 
algorithms to be competitive with a leading existing algorithm. 

Keywords and phrases: Sequential quantile estimation, online quantile 
estimation, sequential distribution function estimation, online distribution 
function estimation. 
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1. Introduction 

Algorithms for elucidating the statistical properties of streams of data in real 
time and for the efficient one-pass analysis of massive data sets are becoming 
increasingly pertinent. These data are being generated by a number of sources 
including the global financial markets, internet applications, sensors embedded 
in various devices and data-intensive scientific research endeavours such as the 
Large Hadron Collider and the Square Kilometre Array (see [5] for a survey 
of the field of Big Data). Ideally such algorithms should be able to process 
observations sequentially, without requiring the storage of all observations. In 
addition, the time taken to process each observation should not grow with the 
number of previous observations. Certain statistical properties naturally lend 
themselves to efficient, sequential calculation such as the mean and standard 
deviation. Depending on the application, these moments may not be sufficient 
however. This may be true for skewed data for example. In addition, if the data 
stream being analysed is prone to outliers then more robust statistics may be 
required. Quantiles are a natural choice in these settings. Examples of areas 
in which sequential quantile estimation is relevant include network traffic and 
latency analysis [3], real time fraud detection [2] and high frequency trading 
(see [17] for an introduction to sequential algorithms in high frequency trad¬ 
ing). Other conceivable applications of sequential quantile estimation include 
real time detection of anomalies and flagging of noteworthy observations, real 
time outlier detection and removal, real time provisioning for future demand 
and load balancing as well as real time risk estimation. 

In many applications of interest one needs to determine whether a particular 
value or observation is greater than or less than a certain quantile. In such cases 
it is more direct to use the cumulative distribution function. Thus, closely linked 
to quantile estimation is distribution function estimation. In this article we pro¬ 
pose new distribution function and quantile estimators based on Hermite series 
expansions and study their properties. These results are novel and interesting 
in their own right. That said, the particular setting we consider is that of online 
estimation and thus existing non-parametric methods are weighed against our 
methods in this specific context. In the general context, there are of course a 
number of well established non-parametric distribution function and quantile 
estimators. The most obvious of these being the sample cumulative distribution 
function also known as the empirical distribution function. Let ^ f{x) be 
i.i.d. random variables drawn from f{x) with cumulative distribution function 


F{x). 


The empirical distribution function (EDF) is defined as: 



i=l 


This estimator is consistent in the sense that it converges point-wise to the 
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true CDF. In fact, it converges uniformly over x (Glivenko-Cantelli theorem). In 
addition Fi (x) has an asymptotically normal distribution with the standard ^/n 
rate of convergence. The EDF estimator is unbiased and its mean squared error 
is MSE(Fi(x)) = e(x)(i-f(x)) ^ EDE estimator has however been shown 
to be asymptotically inferior (asymptotically deficient) compared to the kernel 
distribution function estimator with an appropriately chosen kernel type (in the 
MSE sense, see [24] for a precise definition of asymptotic deficiency). In fact the 
relative deficiency quickly tends to infinity as the sample size increases [24] . The 
kernel distribution function estimator is defined as: 


F2(x) = 



dx'. 


where the kernel function K{x) is a non-negative function that integrates to 
one and has mean zero and the bandwidth, > 0, is a smoothing parameter. 
F 2 {x) is asymptotically normally distributed. The almost sure uniform conver¬ 
gence of F 2 {x) has also been proved (see [24]). 


Closely related to the EDE, the sample quantile is a popular nonparametric 
estimator of the corresponding population quantile. Define the values 
X(i),..., to be a permutation of xi,..., x^ such that x^i) < X( 2 ) < • • • < 
X(;^p Here, x^q, is known as the ith order statistic. The EDE can be written in 
terms of the order statistics as: 


1 J A 

i=l 

The inverse cumulative distribution function or quantile function is defined 
as: 

q(p) = mf{x : F{x) > p}. (1) 

The p-th quantile can be estimated from the order statistics. In particular 
if p G (^, then q{p) = x^q, the ith order statistic. The sample quantile 
estimator is a function of at most two order statistics and thus may suffer a loss 
of efficiency for certain distributions. A natural way to improve efficiency is to 
form a weighted average of several order statistics under an appropriate weight 
function. Such estimators are called L-estimators. The most popular class of 
L-estimators uses a density function (kernel) as its weight function, these are 
known as kernel quantile estimators (see [26]). The kernel quantile estimator is 
defined as: 


n 


Tif 

^ rin 


s — p 


Hi) 


ds 


It has been shown that under suitable conditions on F{x) and hn the kernel 
quantile estimator is more efficient than the sample quantile estimator in the 
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MSE sense [10]. Under comparable assumptions to those to prove joint asymp¬ 
totic normality of a set of empirical quantiles, joint asymptotic normality of the 
kernel quantile estimator has also been proved [11]. The rate is also provided. 
In [26] the asymptotically optimal bandwidth for the kernel quantile estimator 
is derived and the corresponding MSE is provided. It is also shown that many 
different variants of the kernel quantile estimator are asymptotically equiva¬ 
lent in the MSE sense. In addition other L-estimators such as Harell-Davis, 
Kaigh-Lachenbruch and the Brewer estimators are shown to be asymptotically 
equivalent to the kernel quantile estimator above with a gaussian kernel and 
certain smoothing parameters. A more modern approach to estimating quan¬ 
tiles based on the Bernstein-Durrmeyer operator is provided in [20]. Like other 
L-estimators, the BD estimator constitutes a weighted version of several order 
statistics. MSE and MISE consistency results are also provided. 

In the context of sequential distribution function and quantile estimation, the 
aforementioned estimators have shortcomings. Both the EDE and kernel distri¬ 
bution function estimator only allow sequential estimation of the cumulative 
probability at a set of fixed x values (see chapters 4 and 5 of [13] and chapter 
7 of [8] for a discussion of recursive kernel estimators). Eor quantile estimation, 
both the sample quantile estimator and L-estimators such as the kernel quan¬ 
tile estimator and the Bernstein-Durrmeyer estimator require the storage and 
updating of one or more order statistics (a sorted sequence of all observations 
seen so far). Updating the order statistics cannot in general be done in 0(1) 
time. Moreover, the addition of a new observation would in general change a 
number of order statistics. Einally, in the context of sequential quantile estima¬ 
tion on streaming data, non-stationarity cannot be naturally addressed since 
these estimators have no means of forgetting previous observations (other than 
windowing and resetting). 

Approaches specifically to sequential estimator of quantiles have been devel¬ 
oped. Sequential quantile estimation algorithms can be differentiated on whether 
they seek to maintain an online estimate of a single quantile or multiple quan¬ 
tiles. They can be further differentiated on whether they are meant to estimate 
static quantiles of a stream of data or dynamic quantiles of a stream of data. 
In the case of static quantile estimation, online quantile estimates pertain to 
all the data observed so far and quantiles of the stream being analysed are 
assumed to be fixed. In the dynamic case, quantile estimates pertain to the 
current behaviour of the process and it is assumed that quantiles may vary 
over time. A number of algorithms have been proposed for sequential quantile 
estimation in these settings. In [15] the algorithm was proposed which uti¬ 
lizes parabolic interpolation in order to estimate a particular quantile. In [22] 
and [23] this algorithm was extended to the simultaneous estimation of several 
quantiles. In [18] the algorithm was further extended to treat dynamic quan¬ 
tile estimation via exponentially weighted quantile estimators. Algorithms have 
also been proposed based on stochastic approximation [25], [28]. This approach 
was extended to dynamic quantile estimation in [4] by the introduction of Expo- 
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nentially Weighted Stochastic Approximation (EWSA). These existing methods 
have a major shortcoming however, online estimates can only be obtained for a 
particular, pre-selected set of quantiles (e.g. p = 0.5,0.9,0.99 etc.). 

In this article we propose new techniques based on Hermite series estimators 
to maintain an online estimate of the CDF and the full quantile function in 
both the static and dynamic settings and thus yield estimates of the cumula¬ 
tive probability at arbitrary x and estimates of arbitrary quantiles that can be 
updated in constant time (0(1) time). This is the primary advantage of our 
suggested approach. Even if our proposed techniques only have comparable ac¬ 
curacy with existing techniques, they would still be valuable. We demonstrate 
using simulated and real data, that our techniques may in fact be more accurate. 

We begin by reviewing some background related to Hermite polynomials 
and the Gauss-Hermite expansion in section 2. In section 3 we introduce a 
Gauss-Hermite based estimator for the cumulative distribution function and 
discuss a numerical means of obtaining arbitrary quantiles. We then set about 
applying this to sequential quantile estimation in the settings of static and dy¬ 
namic quantile estimation. In this article, we make our treatment of static and 
dynamic quantile estimation concrete by considering the cases of independent 
identically distributed (i.i.d.) data streams and non-identically distributed in¬ 
dependent data streams respectively. Observations are continuous random vari¬ 
ables that are revealed sequentially (i.e. one at a time). The basic algorithm for 
the static case is presented in section 4. We then proceed to treat the dynamic 
case by introducing an exponentially weighted moving average estimator for the 
Gauss-Hermite coefficients in section 5. In section 6 we investigate the quality 
of the Gauss-Hermite GDF and quantile estimators theoretically. We compare 
the proposed techniques to a leading existing algorithm for both simulated data 
(in section 7) and real data (in section 8). Finally, we conclude in section 9. The 
practical consideration of standardising the observations from the data stream 
being analysed is treated in appendix A. Useful MISE results for the exponen¬ 
tially weighted Gauss-Hermite expansion are derived in appendix B. 


2. Background 


2.1. Hermite Polynomials 


In this section we introduce the Hermite polynomials which will play a central 
role in the construction of orthogonal series estimators for probability density 
functions. The Hermite polynomials are a classical orthogonal polynomial se¬ 
quence. Following standard notation [27] we define the Hermite polynomials: 


Hkix) = i-lfe-' 


-P 

dxi- 


which are orthogonal under the weight function e ® i.e. 
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f 


e ^ Hk{x)Hi{x)dx = ^/7^2^k\6kl. 


The following explicit expression may also be utilised: 


L/C/2J 


Hkix)=k\ 


(-ir 


k\{k — 2m)! 


{2x) 


k — 2m 


( 2 ) 


( 3 ) 


m=0 


Finally, some useful inequalities for Hp{x) are as follows [27] (used in [12] for 
example): 


|iJfe{a;))| e ^ <Ca{k + l) ^^‘^,\x\ 
for some constant Ca and non-negative a. 


(2'‘fc!v^) x ^^^Hk{x)) e 2 <da{k + l) 


<a (4) 

\x\ > a (5) 


for some constant da and positive a. 


In the next section we establish the link between expansions in Hermite poly¬ 
nomials and nonparametric density estimation. 


2.2. Gauss-Hermite Expansion 

A number of probability distribution expansions have been defined in terms 
of the Hermite polynomials. These include the Gram Char her A series, the 
Edgeworth series and the Gauss-Hermite expansion. These expansions have been 
applied to successfully fit astrophysical data that are nearly Gaussian but with 
small, meaningful deviations for example [1]. In this research we focus on one 
expansion, termed the Gauss-Hermite expansion following the terminology of 
[1]. This expansion has good convergence properties in practice and is robust to 
outliers [21]. We define this expansion below: 

Uf{x)eL2 x)dx < oc then f{x) can be expressed as follows: 

oo 

f{^) = '^0,kHk{x)Z{x), ( 6 ) 

k=0 

where Z{x) = is the standard normal probability density function 

and 


Ctk = OLk 


f 


Z{x)f{x)Hk{x)dx, 


( 7 ) 


where ap = In what follows, we refer to (6) along with the coefficients 

(7) as the Gauss-Hermite expansion. The fact that f{x) can be expanded in this 
manner is a consequence of the fact that the normalised Hermite functions: 


hk={2^k\^) ~^e-^Hk{x), 
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are an orthonormal basis for L 2 (the space of square integrable functions). 
The Gauss-Hermite expansion is in fact entirely equivalent to the expansion: 

00 

fix) = dkhk{x), (8) 

k=0 

nr 

— \ ^k' 

V (^k 

which is the expansion used to define what is termed the Hermite series es¬ 
timator in [29], [12]. We therefore use the descriptions interchangeably. The 
usefulness of the Gauss-Hermite form of the expansion (8) is that it makes ex¬ 
plicit the role of the normal distribution. Indeed it is explicit that one would 
expect near Gaussian distributions to be well represented with just a few coef¬ 
ficients. 

The + 1 term truncated Gauss-Hermite expansion is defined as: 

N 

fNix) = '^akHu{x)Z{x). (9) 

/c=0 

It is noteworthy that the coefficients (7) are such that the L 2 distance be¬ 
tween f{x) and fN{x) is minimised i.e. no other choice of ap would lead to a 
better approximation of f{x) in the L 2 distance sense. This follows from the 
fact that f{x) G L 2 and the fact that the first N I Hermite functions consti¬ 
tute an orthonormal basis for a A/" + 1 dimensional subspace of L 2 . See [7] for 
a succinct proof using these facts. At this point, f{x) is an arbitrary function 
in 1/2, it need not be a probability density function. For the purposes of den¬ 
sity estimation however we will regard f{x) as a probability density function (a 
non-negative function that integrates to one) that is also in L 2 . 


2.2.1. Truncated Gauss-Hermite Expansions and Nonparametric density 
estimation 

In the context of nonparametric density estimation, a natural estimator for the 
Ok coefficients is (following from equation (7) and the law of large numbers): 

n 

afe = — V Z(xj)iJfe(xj), Xj ~ f{x), (10) 

n 

i.e. the x^’s are observations from the probability distribution that we wish 
to estimate non-parametrically. 

The N 1 term truncated Gauss-Hermite expansion with estimated coeffi¬ 
cients is thus: 
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N 

Dix) = '^akHk{x)Z{x). ( 11 ) 

k=0 

Note that this is a biased estimator of f{x). One common measure of the qual¬ 
ity of the estimate of an unknown probability distribution is the mean integrated 
squared error (MISE). Let the true probability density function f{x) G L 2 . The 
mean integrated squared error associated with the truncated Gauss-Hermite 
expansion of f{x) is: 


E 



f{x))^dx = E 


- N 

E 


.k=0 






00 

+ E 

k=N+l 




a 


2 

k 


where we have made use of Parseval’s identity, ~ p{x)dx 

and the definition of ap (7). 

The first term is associated with the error due to using estimates of the coef¬ 
ficients, dk instead of the true, unknown, coefficients ap. This is the integrated 
variance term. The second term of the MISE is associated with the error due 
to truncation. This is the integrated squared bias term. In [12] the MISE con¬ 
sistency was proved under certain conditions. 


A few further comments are in order. In practice, the Gauss-Hermite expan¬ 
sion provides a good fit to a wide variety of probability density functions [21]. 
Eor completeness however, the following shortcomings should be noted as they 
may be important depending on the application of the Gauss-Hermite estimate 
of the density. In principle, for the truncated series, the probability density 
function that results may be negative at certain values of x. Also, truncated 
Gauss-Hermite expansions should capture nearly Gaussian distributions well in 
a relatively small number of coefficients. However, for distributions that differ 
greatly from the Gaussian distribution, a large number of coefficients may be 
required for a satisfactory fit, even if convergence is guaranteed in principle. 


In the next section we define an estimator for the cumulative distribution 
function associated with f{x) using the Gauss-Hermite expansion and discuss 
numerical means of obtaining quantiles. 


3. Estimating Quantiles using the Gauss-Hermite Expansion 
3.1. Cumulative Distribution Function 

In this section we derive an analytical expression for a Gauss-Hermite based 
cumulative distribution function estimator. To the best of our knowledge, this 
is the first such analytical derivation of a cumulative distribution function es¬ 
timator based on the Gauss-Hermite expansion (i.e. based on Hermite series 
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estimators). We utilise this expression to numerically obtain quantiles. We have 
discussed a number of well-established results on smooth CDF estimators based 
on other nonparametric techniques in the introduction. 

Before we begin, we recall the definitions of the Gamma functions: 

r(a,x) is the upper incomplete Gamma function defined as: 

poo 

r(a,x) = / 

J X 

7 (a, x) is the lower incomplete Gamma function defined as: 

7(a, x) = f 
Jo 

and r(a) is the usual Gamma function defined as: 

poo 

r(a) = / 

Jo 

Now, utilising (3) and (11) a natural estimator for the cumulative distribu¬ 
tion function associated with f{x) can be obtained as follows: 

For X < 0: 

Fn 


(a;) = f fM{x')dx' 

J —OO 

N i-x 

= Hk{x')Z{x')dx' 

k=0 

N [k/2} (_^\lok-2l px 

= —= [ 

^ k.{k - J- 


k=0 1=0 

N lk/2\ 


f (^') 

J —OO 


k-2ie-.'V2dx' 


- 7 1 ( — 1 )^ 2 ^ ^ XZ—--n/ 7 k 1 N 


k=0 1=0 




2 2^2 
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For X > 0: 
Fn 


(x) = f fN{x')dx' 

J —OO 

/ OO nOO 

fN{x')dx'- fN{x')dx' 

-OO J X 

E r poo poo 

Y,ak / Hk(x')Z{x')dx'- Hk(x')Z{x')dx' 

7._n OO Jx 


Thus, 


/C = 0 L^-OO 

N \_kl2\ 


(-1)^2^ 


lok-2l 


poo 

/ {x'f-'^F-x’ 

J X 


/c=0 /=0 

coo 


/ OO 

{x') 

-OO 




”e ■“ /^dx' 


iV \_k/2\ 






fe=0 I 




22 ^ 2 X 


[(-I)"-"' +1] r(-; + ^ + ^) - r(-« + ^ y) 


AT L^/2J 






k=0 I 


^ l\{k- 2 l)\V^ 


22 ^ 2 X 


[(_i)^-2/] r(-; + ^ + i) + 7(-« + ^ y) 


Fn{x) = < 


Ef=o«fefc!E}:'o 

if X > 0, 

E^-O E1-0^^ (-i)-i+>02f-3i-ir(-;+|+iy) 


Lfc/2J (-l)‘2f-"'-^[[(-l)fe-"‘]r(-i + | + l)+7(-f+| + i4)] 

Z!(/c—2Z)!i/7r 


l\{k — 2l)\^/^ 


^ if X < 0. 


( 12 ) 

The expression (12) allows us to directly estimate the cumulative distribution 
function without the need to numerically integrate the estimated probability 
density function (11). 


An alternative estimator for the cumulative distribution function is as follows: 


i-E,^=o4fc!Eeo- 


Fn{x) = 


\_k/2\ (-1)^2^- 


T(-Z+§ + |,^) . 


Ef=o«fefc!E}:: 


Lfe/2J (-1) 


J=0 


l\(k—2l)\y/7r 

-3i-ir(-;+|+iy) 


if X > 0 


7 _L 3 fc 
— t + K: 2 “2~ 


l\(k — 2l)\^/7T 


if X < 0 


(13) 
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This expression is derived in the same manner as (12) except that the integral 
fN{x')dx' is replaced with unity. We have found that empirically this es¬ 
timator yields more accurate results. This may depend on the situation however. 


3,2, Inverse Cumulative Distribution Function 


The inverse cumulative distribution function or quantile function is defined in 
equation (1). We can utilise (12) (or (13)) along with a numerical root-finding 
algorithm to determine the value of the p-th quantile, Xp = q{p), 0 < p < 1. 
Newton’s method can be applied for example. In this case, the following equation 
is iteratively evaluated until sufficient accuracy is achieved: 


^(«+i) _ _ 


Fn{xp ^) -P 

/iv(4*^) 


(14) 


where fN{x) is given in (11) and Fn{x) is given in (12) (or (13)). Natu¬ 
rally, convergence behaviour will depend on the choice of initial value, x ^^, and 
the properties of fN^x). In principle convergence may be slow, or the method 
may not converge at all. If techniques such as Newton’s method and related 
methods prove unstable in a given setting, more robust numerical root-finding 
algorithms can be applied. The best choice of root-finding algorithm and opti¬ 
mal initial value selection are areas for future research. 


It is important to note that in many cases of interest, the quantile itself is 
not required but rather it is necessary to determine whether an observation is 
above or below a particular quantile (consider outlier detection for example). 
In this case, no root finding is required. One simply plugs the observation into 
the cumulative distribution function and determines whether the cumulative 
probability is less than or greater than p. 


4, Online Quantile Estimation: Static Quantiles 

The problem we treat in this section involves estimating an unknown cumulative 
distribution function (and associated inverse cumulative distribution function) 
from a stream of independent and identically-distributed (i.i.d.) continuous ran¬ 
dom variable data. The Gauss-Hermite expansion furnishes an efficient means 
to achieve this. The primary reason for this is that the coefficients in the expan¬ 
sion can be updated with each new observation without recalculating the entire 
sum in (10). We can just incorporate a new term corresponding to the new ob¬ 
servation and maintain a running average for each coefficient. Moreover, we can 
simply plug in these updated coefficients into the analytical expression for the 
cumulative distribution function we have derived (12) (or (13)). Any quantile 
can then be obtained by a simple numerical root finding procedure (section 3.2). 
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The basic algorithm can be summarised as follows: 

1. Initialize + 1 coefficients as follows = a/cZ(xo)i^/c(xo), where xq is 
the first observation from the data stream and k = 0 ... N. 

2. For each new observation, x^, update .. .a^ as follows: 


(i - 1)4* + akZ{-Xi)Hk{-x.i) 


,k = 0,...,N. 


(15) 


3. Plug the updated coefficients ho,... uat into the expressions (11) and (12) 
(or (13)) to obtain updated estimates of the probability density function 
and cumulative distribution function respectively. 

4. Utilise a numerical root finding algorithm, along with the updated cu¬ 
mulative distribution function (and potentially the updated probability 
density function) to determine any arbitrary quantile. 


The above algorithm can also be applied to summarise the distribution of 
massive datasets in an efficient, one-pass manner which should be particularly 
useful when the size of the dataset is larger than the available memory. 


The computational cost of updating each of the coefficients (10) is manifestly 
constant (0(1)) and does not depend on the number of previous observations. 
Also, since the cumulative distribution function only depends on the coeffi¬ 
cients and has no explicit dependence on the observations, the time complexity 
of updating the cumulative distribution function following the arrival of a new 
observation is also 0(1). Similarly, the computational cost of the numerical root 
finding algorithm yielding any quantile of interest from the updated cumulative 
distribution does not depend on the number of previous observations. This is 
to be contrasted with deterministic approaches to obtaining quantiles such as 
efficient heap based median maintenance which has a time complexity 0(log i) 
at the i-th observation and has growing space requirements. 


While the updating procedure for the coefficients is fully sequential, it is clear 
that since we use a fixed and constant N the resultant Gauss-Hermite estimate 
of the probability density function is biased and thus, the resultant CDF and 
quantile estimates will in general be biased too. Thus our quantile estimator 
is sequential but biased. This bias does not prevent the estimator from being 
useful however. 


Data streams that have static quantiles are likely to be less prevalent than 
those that have dynamic quantiles (quantiles that vary over time). Indeed, many 
real-world data streams of interest exhibit non-stationarity. In section 5 we con¬ 
sider how the proposed algorithm can be modified to treat quantile estimation 
in the more realistic, dynamic setting. 
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4-1- Selection of N 

It is natural to assume that the quality of the CDF and quantile estimates is re¬ 
lated to the MISE of /Ar(x). Indeed, we demonstrate in section 6 that the MISE 
of fN{x) directly determines bounds on the MSE of the CDE and the MAE 
of the resultant quantile estimates for distributions with non-negative support 
under certain conditions. When viewed in the context of the MISE of /Ar(^), 
the choice of N controls the trade-off between the integrated variance and inte¬ 
grated squared bias of the estimate of f{x). Eor the Hermite series estimators, 
the integrated variance term vanishes as n ^ oo for fixed N (under certain con¬ 
ditions on /(x), see [12]). This leaves the contribution from the integrated bias 
term. The higher the value of the smaller the integrated bias. Thus in the 
setting of streaming data or one pass analysis of a massive data set, where we 
regard n ^ oo^ N would naively be made as large as possible to minimise the 
bias and hence the MISE (recall that the MISE controls the quality of the CDE 
and quantile estimates, see section 6.2). It is worth noting however that memory 
requirements and processing time increase with N since more coefficients have 
to be stored and updated. In addition, early quantile estimates could be poor for 
large N. If the intended application is sensitive to poor early quantile estimates, 
a small sample from the data stream or massive data set can be analysed in 
order to select N. While it is clear that in general the optimal N is different for 
the PDE, CDE and quantile estimates, our results suggest that it is a reasonable 
starting point to attempt to minimise the MISE of fN^x). Principled techniques 
exist to select the (MISE) optimal N such as the Kronmal-Tarter optimal stop¬ 
ping rule algorithm [16], [19]. This algorithm must be applied with care though 
as it may perform poorly if f{x) is multimodal or peaked as pointed out in [9]. 
The reader is referred to [9] and [14] for improvements to the algorithm. Data 
driven selection of N specific to the CDE and quantile estimates is an area of fu¬ 
ture research. In our simulation studies we demonstrate that above a minimum 
size of N, the effectiveness of the algorithm is in fact not critically dependent 
on the choice of N. Good results can be obtained for a range of values of N. 
Through extensive empirical studies we have determined that a value of N = 6 
yields good results for data with a unimodal distribution for example. 


5. Online Quantile Estimation: Dynamic Quantiles 

The problem we treat in this section involves obtaining a local estimate of an 
unknown cumulative distribution function (and associated inverse cumulative 
distribution function) from a stream of continuous random variable data with 
dynamic quantiles. In order to do this we replace the Gauss-Hermite coefficient 
estimator defined in (10) with an exponentially weighted moving average esti¬ 
mator for the coefficients. We will term the resulting expansion an exponentially 
weighted Gauss-Hermite expansion (EWGH expansion). The new estimator for 
the coefficients is given by: 
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= A[Q;feZ(xi)iJfe(xi)] + (l-A)a[* 

4°^ = [(^kZ{-x.o)Hk{-x.o)] ,k = 0,...,N, (16) 

where 0 < A < 1 controls the weight of new observations (and controls how 
rapidly the weightings of older observations decrease). This weighting scheme 
allows the local behaviour of the data stream to be tracked. The algorithm 
for obtaining arbitrary quantiles presented in the previous section is essentially 
unchanged except that we replace (15) with (16). To re-iterate, the updated 
coefficients can then be plugged into into the expressions (11) and (12) (or (13)) 
to obtain updated estimates of the probability density function and cumulative 
distribution function respectively. A numerical root finding procedure can again 
be applied to obtain arbitrary quantiles. 


5,1, Selection of the Parameters A and N 

For the choice of N, the same broad considerations apply as in section 4.1 i.e. 
the more complex the probability distribution of the data being analyzed, the 
higher the appropriate N to ensure a sufficiently low bias. In our simulation 
studies we have observed that a value of = 6 gives competitive performance 
for all choices in the set A = 0.01,0.05,0.1, for unimodal distributions. 

Given a choice of N, there are two factors to consider when selecting A. The 
first is how quickly the quantiles of the non-stationary data stream are expected 
to vary. We expect that the optimal A will be smaller for slowly varying quan¬ 
tiles and larger for rapidly varying quantiles. By selecting A, one is essentially 
selecting an effective window size of previously observed data to include in the 
quantile estimation. This follows from the fact that more recent data is weighted 
more heavily than older data. Consider the fraction of the weight included in 
the most recent r terms. 


r—1 

A^(l-A)^ = l-(l-A)^ 

j=0 

This is to be contrasted with the weight of the first (oldest) term: 

(1 - A)’- 


If we define the effective window size as that number of observations for which 
99.9% of the weight is contained in the most recent r terms or equivalently that 
the remaining 0.1% of the weight is associated with the first (oldest) term, then 
we see that the effective window size is: 

log 0.001 
log (1 - A) 
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We include below a tabulation of some commonly used values of A in EWMA 
applications and their associated effective window size in number of observa¬ 
tions. 


A 

Effective Window Size 

0.01 

687 

0.05 

135 

0.1 

66 

0.2 

31 


The ideal effective window size should be selected by judgement and domain 
specific knowledge of the data being analysed. For example, when analysing high 
frequency forex return data, one may expect that the most recent observations 
are the most pertinent and only the previous few minutes of observations would 
be relevant to estimating the current, local behaviour of the process. The second 
factor to consider is that A cannot be too large. As we demonstrate in section 
6, the MISE of /Ar(^) determines the quality of both CDF and quantile esti¬ 
mates under certain conditions. Using the bound on the MISE that we derive 
in theorem 8 we see that we can achieve a small integrated variance term by 
ensuring: 




is sufficiently small. As a starting point, we have found through extensive 
empirical studies that the common choices of A = 0.01, 0.05, 0.1 yield good per¬ 
formance of the algorithm in a number of scenarios. Indeed, in our simulation 
studies we demonstrate that the EWGH algorithm provides good results over 
this range of values for A. 


6. Quality of CDF and Quantile estimates 

In this section we derive error bounds on the mean squared error of the Gauss- 
Hermite GDF estimator and the mean absolute error (MAE) of the quantile 
estimator for distributions with support on the positive half-real line, subject 
to some additional conditions. In particular, we demonstrate that these error 
bounds directly depend on the MISE of the associated probability density func¬ 
tion estimates. This greatly simplifies obtaining the aforementioned error bounds 
and allows us to examine the asymptotic behaviour of the Gauss-Hermite based 
GDF estimator and the associated quantile estimator. While these results do 
not directly apply to the sequential quantile estimation setting (since N and A 
are fixed in that case), they are novel and interesting in their own right. These 
asymptotic results provide general context and provide comfort that the be¬ 
haviour of estimators constructed from the Hermite series probability density 
estimators have sensible asymptotic properties. To obtain asymptotic results we 
utilise existing MISE consistency results along with the associated rates for the 
standard Gauss-Hermite expansion [12] as well as novel MISE results for the 
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exponentially weighted Gauss-Hermite expansion which we derive in appendix 
B. All the results derived below are easily extended to f{x) with support on a 
bounded interval, [a, 6] with F]\^{x) = fN{x')dx'{omitted for brevity). Our 
approach below is not directly generalisable to (—oc,oo) however and deriving 
error bounds in that case is an interesting problem for further study. 

6.1, Quality of the Cumulative Distribution Function Estimate 

In what follows we assume that f{x) is supported on [0, oc). 

For non-negative random variables we obtain the simpler estimator: 


Fn{x) = 


I 


fN{x')dx' 


N Vk/2\ 

= - 

/ c =0 1=0 


■j{-l + I + 5, V) 


l\{k — 2/)!y^ 


^ > 0. (17) 


The expression (17) differs from (12) since the domain of integration is dif¬ 
ferent. We consider the Mean Squared Error (MSE) criterion and an integrated 
weighted MSE criterion (inspired by the Cramer-von Mises criterion) as mea¬ 
sures of the quality of the estimated cumulative distribution function. The in¬ 
tegrated weighted MSE criterion is defined as follows: 


U) 


2 



Fn{x) - F{x) f{x)dx 


1 2 


Fn{x) - F{x) f{x)dx, 


(18) 


where we have made use of Eubini’s theorem which allows us to interchange 
the ordering of the integrals. Note that the PDE f{x) is the weighting factor in 
(18). 

Proposition 1. Suppose f{x) is supported on [0, 00) and f{x) G L2 then we 
have: 


for fixed x. 


E 


Fn{x) — F{x) 


<xMISE{M. 


Proof. Eor a non-negative random variable: 


Fn{x) - F{x) 



r\ 


Jo ^ 


dx' 
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By the Cauchy-Schwarz inequality: 

Fn{x) - F{x) ' < r Dix') - fix') ' dx'l \ r dx' 

L^o J L^o 


Thus: 


dx' 


Fn{x) - F{x) ^ <x\ r Uix') - fix') 

Uo 

Now, since ifN{x') — f{x'))^ is non-negative we have, 

nX poo 

/ ifNix') - fix')Ydx' < / if Nix') - fix'))^dx'. 

J 0 J —oo 


Thus: 


E 


pNix) - Fix) 


< xE 


/ oo 

Dix')-fix') 

-OO 


dx' 


= xMISE(/^). 


(19) 


This implies that if we have an upper bound for the MISE of fN{x) we can 
bound the MSE of Fj^ix). 

□ 

Proposition 2. Suppose /{x) is supported on [0, oo) and/{x) has a finite mean, 
pi < oo then we have: 


uF < MISEiMfi. 

Proof Utilizing proposition 1 we have. 


xfix)dx 

= MISE(/^)/i, (20) 

where /r is the mean of fix). 

□ 

We now consider the consistency and associated rate of convergence for the 
CDE estimator (17) defined from the standard Gauss-Hermite coefficients (10). 

Theorem 1. Suppose fix) is supported on [0, oo) and fix) G I/2. In addition 

suppose that ^ > 0 as A^(n),n ^ 00 and E\X\i < 00 then we have: 


< MISE(/Ar) [ 

Jo 


E 


pNix) - Fix) 


2 
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Proof. The result follows from proposition 1 and the fact that MISE(/ 7 v) —^ 0 
under the conditions ^ ^ 0 as N(n),n oo and E\X\i < oo [12]. □ 

Theorem 2. Suppose f{x) is supported on [0,oo); f{x) G L 2 , r >1 derivatives 

of f{x) exist and {x — f{x) ^ ^ 2 * Suppose in addition that ^ ^ 0 as 
N{n)^n oo and E\X\i < 00 then if: 

N{n) we have 


E 


Fn{x) — E{x) =xO(jl 2 r/( 2 r+l)^ 


Proof In [12] it is established that provided f{x) G I/ 2 , r > 1 derivatives of 

f{x) exist, {x — ^Yf{x) ^ L 2 , ^ ^ 0 as N{n),n oo and E\X\i < 00 
then if: 

N{n) ^ we have 

MISE(/iv) = O . 

Combining this with proposition 1 completes the proof. 

□ 

Note that for r = 1 the rate is 0(n“^/^). It is important to note that this 
rate is suboptimal compared to the smooth kernel CDF estimate rate which is 
0{n~Y' For smooth probability density functions (r ^ oo) satisfying the ap¬ 
propriate conditions, the rate for the Gauss-Hermite CDF estimator approaches 

0{n~Y‘ 

Theorem 3. Suppose f{x) is supported on [0, oc) and f{x) has a finite mean, 
/i < oc. In addition suppose that ^ ^ 0 as N{n),n oc then we have: 

0 . 

Proof. The result follows directly from proposition 2 and the fact that 
MISE(/Ar) ^ 0 under the conditions ^ ^ 0 as N{n),n oc and E\X\i < 
oc [12] (for positive random variables, we have E{X) < oc implying E\X\i < oc 
by the Lyapunov inequality). 

□ 

Theorem 4. Suppose f{x) is supported on [0,oc), f{x) has a finite mean, 
fi < oc, f{x) G 1/2; r > 1 derivatives of f{x) exist and {x — ^Yf{x) ^ L 2 . 

Suppose in addition that ^ 0 as N{n),n oc then if: 
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N{n) we have 


00-2 =0 (^n-‘^r/{2r+l)'^ _ 

Proof. This follows directly from proposition 2 and the MISE result referenced 
in the proof of theorem 2. 

□ 

Remark. An important class of distributions for which we can gain further 
insight into the theoretical performance of the Gauss-Hermite CDF estimator 
(and quantile estimator in principle) is power-law distributions. These are heavy¬ 
tailed distributions and are highly relevant in a number of fields including finance 
[6]. We utilise the following definition of the power law distribution: 


fix) = 


a — 1 


where a > 1 is a requirement for normalisability. In addition, we assume 
^min ^ Thus f{x) is supported on [0,oo), f{x) G L 2 and all derivatives of 
f{x) exist for x > ^j^iin* require in addition that a > 2, then we have 

a finite mean E{X) < 00 . The condition in the theorems proven above that 
{x — f{x) ^ L 2 can be related to a as follows: 


Denote the operator ^ as D. Now: 


{x - DYf{x) 


——^ (x^ + x''-^D + x'^-'^Dx + x^-‘^D‘^ + --- + D^) 
(^min 




since x and D are non-commutative operators. Thus 
[{x — DYf{x)]^ = This implies that for [{x — DYf{x)]^ to be 

integrable, we must have —2{a — r) < —1. Which implies r < a — ^ and 

thus r = \a — — 1. Thus if we also have ^ ^ 0 as N{n),n oc, 

theorem 1 and theorem 2 imply that the Gauss-Hermite CDF estimator is 

2 


consistent for power-law distributions and the rate is E 


En{x) — E{x) 


xO(^n 2 I 2 )/( 2 ra 1] i)j. Similarly theorem 3 and theorem 4 imply that 

^ 0 and the rate is = O ^ 72 -( 2 r«-il- 2 )/( 2 ra-^l-i)^ ^ Thus for power-law 

distributions that have a finite mean, the rates are O or better. For 

power-law distributions that have a finite variance, a > 3 and thus the rates 
are O or better. As a ^ oc (along with the number of finite moments 

of the power-law distribution), the Gauss-Hermite rates approach O 
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To summarise, we have derived the asymptotic results above in the setting 
where N depends on n, i.e. N = N{n). These results give comfort that the 
Gauss-Hermite based CDF estimator has sensible asymptotic behaviour. Our 
proposed online estimators have N fixed however. Thus results such as theo¬ 
rem 1 do not apply directly. Instead, as n ^ oo, these online estimators have 
MSE bounds determined by the integrated squared bias of the truncated Gauss- 
Hermite PDF estimators on which the plug-in CDF estimators are based. While 
this bias persists for fixed N even as n ^ oc, these estimators are nonetheless 
very useful in practice as we have demonstrated in our simulation and real data 
results. In addition, we can still gain insight into the behaviour of these estima¬ 
tors at fixed N. We now consider the behaviour of the EWGH CDF estimator 
defined from utilising the exponentially weighted Gauss-Hermite coefficients (16) 
in the expression (17) for the CDF where A is fixed, N is fixed and sufficiently 
large and n ^ oo. We begin with the case of i.i.d. data drawn from f{x). 

Theorem 5. Suppose f{x) is supported on [0,oo), f{x) has a finite mean, 
pi < oo, f{x) G 1/2; r > 1 derivatives of f{x) exist and {x — f{x) ^ ^ 2 - 
Then for N fixed and suffieiently large and n ^ oo. 


and 


E 


Fn{x) - F{x) 




2-A 


0{N-Y , 


= 0(ArV2) 


0{N-Y- 


Proof The result follows from proposition 1 and 2 respectively along with the¬ 
orem 8. □ 


We now treat the case of independent, non-identically distributed data. In 
particular, we consider the case of a change point where the distribution changes 
from fi{x) to / 2 (^)- We consider this a fundamental example of non-identically 
distributed data. 


Theorem 6. Suppose s + 1 observations are drawn from a probability dis¬ 
tribution fi G 1/2 followed by a further t observations from a seeond distri¬ 
bution /2 G 1/2 i.e. we assume an independent sequenee of s F 1 observa¬ 
tions xq, ... ,X 5 ^ fi{x) followed by an independent sequenee oft observations, 
Xs+i ,... ,Xt +5 ^ f2{x). Ifr> 1 derivatives of f2{x) exist and {x - f2{x) G 

L2 and both distributions have a finite mean, then: 


E 


-Fjv(a:) — F2{x) 




4(1 - A)2* +-- [i-(1-A)2(^+*)1+(1 - 


2-A L 


+ 0{N- 


and 
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uj^ = 0{N^/^) 4(1-A)2* + ^ [i-(1-A)2("+*)1+(1-A)2("+*) +0{N-^), 

2 — A L J 



Proof. The result follows from proposition 1 and 2 respectively along with the- 


□ 


orem 9. 


Note that asymptotically we can choose A(n) ^ 0, n ^ oo such that 
2 _ 


E F]sf{x) — F{x) 0 and ^ 0 for both the i.i.d. case and the non- 


identically distributed, independent (change point) case. We do not present the 
proof here for the sake of brevity. 

6.2. Quality of Quantile Estimate 

Theorem 7 . Suppose f{x) is supported on [0,oo). In addition we suppose that 
the true quantile Xp lies between [x'^'^^^ and that f{x) > d, d > 0 for 

X G [x'^^'^Finally we assume that we know x^'^^^x^^^^d allowing us to 
refine the Gauss-Hermite quantile estimate as follows: 


(F^\p) ifP^^ (p) exists in [x 



Xp = < and fN{x)Pd, xe , x^ 

^ undefined otherwise. 

For well defined Xp: 


( 21 ) 


E \xp - xp\ < 2F^MISE{fN{x)). 


Proof. In the proof of proposition 1 we established: 



— oo 


Thus: 



( 22 ) 


( 23 ) 


where Xp = q{p). Provided Xp = Fn^ (p) exists, this implies: 
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Fn{xp) — F]\[{xp) 



fN{c) 


< 


< 


< 



(/at( a;') - f{x')Ydx' 


{fN{x') - f{x')Pdx' 


{fN{x') - f{x')Pdx', 


(24) 

(25) 

(26) 


where c lies in the interval (xp, Xp) if Xp > Xp or (xp, Xp) if Xp < Xp. We have 
applied the mean value theorem and thus /Ar(c) is equal to the mean value of 
/at in the interval i.e. /jv(c) = fN{x)dx. 


Thus, provided fN{c) 7 ^ 0, we have: 




fN{c) 


ISE(/^), 


(27) 


To get a more concrete result we assume we utilize our refined quantile es¬ 
timator. This estimator is quite natural as it restricts the quantile estimates 
to those formed from bona-fide probability densities i.e. those fN{x) that are 
non-negative. Moreover, requiring fN{x) > 0 ensures that there is a unique 
solution for Xp = qN{p)- Finally, it allows us to reject quantile estimates that 
are out of bounds. For the estimates that are not undefined we have, via the 
Cauchy-Schwarz inequality: 


E\xp-xp\ < ^^MISE(/jv). (28) 

□ 

This result again demonstrates the direct link to the MISF of the probability 
density function and allows one to, in principle, establish asymptotic properties 
under certain conditions (similar to above results for the CDF). As a note to 
the practitioner, this refined quantile estimator can be viewed as one where 
we reject estimates that are out of bounds and those that are not constructed 
from bona-fide probability density estimates. We expect this to be an infrequent 
scenario unless N is too small and the probability density estimator is heavily 
biased or too few observations have been incorporated into the estimate. We 
base this expectation on extensive empirical analysis. 


7. Simulation Results 

In this section we evaluate the behaviour of the Gauss-Hermite (GH) online 
quantile estimation algorithm presented in section 4 and the Exponentially 
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Weighted Gauss Hermite (EWGH) algorithm presented in section 5 on simulated 
data. We also compare the performance of these algorithms to a leading exist¬ 
ing algorithm for online quantile estimation, namely Exponentially Weighted 
Stochastic Approximation (EWSA), which has been shown to be competitive 
with a number of other algorithms for online quantile estimation [4]. The EWSA 
algorithm is an exponentially weighted version of the stochastic approximation 
algorithm of [28]. EWSA has two parameters, one that controls the size of the 
batches of data used to update the quantile estimates (denoted M) and a weight¬ 
ing factor w that controls the weighting of updates to the density and quantile 
estimates in the stochastic approximation scheme (see [4] for a detailed descrip¬ 
tion of the algorithm). 

In our investigations both i.i.d and non-identically distributed simulated data 
are considered. In particular, i.i.d data from a chi-squared distribution with five 
degrees of freedom, xs, and an exponential distribution with mean and variance 
equal to one are considered. In the i.i.d. case, the data have static quantiles. 

In the non-identically distributed setting we consider simulated data drawn 
from distributions with non-stationary parameters. In particular we consider 
simulated data drawn from a normal distribution with variance one and a mean 
of .006j at update j to simulate data with a linear trend in the mean. We then 
consider an exponential distribution with mean and variance equal to 1 + 0.006j 
at update j to simulate data with a non-stationary mean and standard devia¬ 
tion. This is an interesting test in that the dispersion of the data increases as 
the number of observations grows. These non-identically distributed simulated 
data have dynamic quantiles that change over time. Both of these models were 
studied in the simulations of [4] . 

The choice of these test distributions can be motivated by the diversity of 
their properties and the frequent appearance of these distributions in statistical 
applications. Eor each distribution, three quantiles are estimated, namely the 0.5 
(median), 0.9 and 0.99 quantiles following [4]. Note that in the case of the Gauss- 
Hermite based algorithms arbitrary quantiles can be obtained at any point in 
time whereas algorithms such as the EWSA algorithm require the quantiles to 
be specified upfront. Online, arbitrary quantiles are not available for algorithms 
such as EWSA. The maximum number of observations, m = 4000 per run in the 
i.i.d. case and m = 1000 in the non-identically distributed case (corresponding 
to the maximum number of updates in [4]). There were 1000 runs in total for 
each distribution. We utilise the empirical root mean squared error (RMSE) 
to evaluate the performance of the online quantile estimation algorithms in 
estimating the quantile q after j observations (where we denote the quantile 
estimate qj). The RMSE at updating step j is defined by: 

RMSE(g,.)= (29) 

and is estimated by averaging the squared difference between qj and qj over 
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the 1000 simulation runs and then taking the square root. As a measure of the 
error in the RMSE estimate at updating step we construct a 95% percentile 
bootstrap confidence interval: 

(RMSE^O.025), RMSE^o.975)), 

where RMSE*q Q 25 ) and RMSE*q 975 ^ denote the 0.025 and 0.975 quantiles of 
the bootstrap estimates of the RMSE respectively (1000 bootstrap estimates 
were utilised in constructing the intervals for each j ). 

In [4] a range of values for the EWSA algorithm parameters M and w were 
investigated for performance. It was demonstrated that w = 0.05 gives the best 
trade-off between bias and long-run variability. In addition, the value of M = 15 
was shown to be superior in long-run performance to other values of M tested. 
Since we evaluate the same i.i.d. and non-independently distributed data stream 
models as [ 11 ] in our simulations (except for the addition of the chi-squared i.i.d. 
model which is qualitatively similar), we regard these parameters as principled 
choices for good performance of the EWSA algorithm in these settings. 

In our simulation studies for the GH and EWGH algorithms we demonstrate 
the effectiveness of the algorithms over a range of values for N and A and 
identify particularly good choices. Goncretely for the GH algorithm we con¬ 
sider N = 4, 6 , 8 ,10,12 at m = 100,400,4000 observations to evaluate the bias- 
variance trade-off at different numbers of observations. We also present RMSE 
curves where we compare the results to the EWSA algorithm. Eor the EWGH 
algorithm we consider A = 0.01,0.05,0.1 {N = 6) at m = 100,400,1000 obser¬ 
vations and present RMSE curves comparing to the GH (A^ = 6 ) and EWSA 
algorithms. 

Einally, in order to practically apply the Gauss-Hermite based algorithms 
more effectively, an online standardisation procedure was applied to the data as 
outlined in appendix A. This is not a pre-processing step (this would defeat the 
purpose of an online algorithm) but rather part of the online algorithm. Also, 
this procedure may not always be necessary, depending on the application. It is 
also noteworthy that we utilise the alternate CDE estimator (13) in estimating 
quantiles for the GH and EWGH algorithms as we have found this estimator to 
yield better results empirically. 

7.1. IID Data 

Eor the i.i.d. simulated data, we evaluate the performance of the GH algorithm 
for various choices of N and we use the EWGH algorithm (A = 0.05) and 
the EWSA algorithm (M = 15, re = 0.05) for comparison. The GH algorithm 
performs well for most choices of N, illustrating that the effectiveness of the 
algorithm is not critically dependent on this choice. That said, the value of 
N = 6 appears to be the best choice in most cases when viewed from a RMSE 
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perspective. In addition, A/" = 6 is a good choice from a computational speed and 
efficiency viewpoint since there are fewer coefficients to store and update than for 
higher choices of N. When compared to the EWSA algorithm, the GH algorithm 
performs better in most cases. The EWGH algorithm (with A = 0.05) has a 
larger error than the GH algorithm in all cases. This is not entirely surprising 
however. The EWGH algorithm trades extra variance in the estimates of the 
coefficients for the ability to track dynamic quantiles. The individual tests are 
discussed below. 

7.1.1. The Chi-Squared Distribution 

The chi-squared distribution appears frequently in statistics and is a useful 
test distribution in that it has support on the half real line [0, oo) and not 
the full real line. This distribution is used to simulate skewed data. This is 
a challenging test case for the Gauss-Hermite based algorithms since the chi- 
squared distribution is a considerable departure from the normal distribution 
which undergirds the Gauss-Hermite expansion. We consider the chi-squared 
distribution with five degrees of freedom in particular. See figure 1 for plots of the 
GH RMSE for N = 4,6, 8,10,12 at m = 100,400,4000 observations. The results 
for the EWGH and EWSA algorithms are also included for comparison. These 
figures illustrate that good results are achieved for all values of N considered for 
the GH algorithm. N = 6 appears to provide the best results. It is interesting to 
note that upon first inspection, it is counter-intuitive that the RMSE increases 
at higher values of N even when the number of observations is large. We suspect 
that this is due to the additional bias introduced by the online standardisation 
procedure as well as by using the CDE estimator (13) instead of (12). See figure 
3 for a comparison of the GH algorithm {N = 6) and the EWSA algorithm. Note 
that the EWSA results were excluded from figure l(i) and figure 3(c) since they 
were disproportionately large and would obscure the GH results when presented 
on a common scale. This may indicate instability in the EWSA algorithm for 
estimating tail quantiles such as p = 0.99. 

7.1.2. The Exponential Distribution 

The exponential distribution is another commonly occurring distribution. The 
distribution also has support on the half real line [0, oc) and is used to simulate 
skewed data. The exponential distribution is an even more challenging test case 
for the Gauss-Hermite based algorithms than the chi-squared distribution. This 
is due to the fact that the exponential distribution’s mode occurs at the start of 
its domain which is to be contrasted with the mode of the normal distribution 
which is equal to its median. We consider the exponential distribution with 
mean and variance equal to one. See figure 2 for plots of the GH RMSE for 
N = 4,6, 8,10,12 at m = 100,400,4000 observations. The results for the EWGH 
and EWSA algorithms are also included for comparison. These figures again 
illustrate that good results are achieved for all values of N considered and 


M. Stephanou et al./Sequential Quantiles via Hermite Series Density Estimation 25 


that the value of N = 6 appears to provide the best results. See figure 4 for 
a comparison of the GH algorithm {N = 6) and the EWSA algorithm. Note 
that the EWSA results were excluded from figure 2(i) and figure 4(c) since they 
were again disproportionately large and would obscure the GH results when 
presented on a common scale. 

7.2. Non-identically Distributed Data 

Eor the non-identically distributed simulated data, we evaluate the performance 
of the EWGH algorithm for various choices of A and we use the GH algorithm 
{N = 6) and the EWSA algorithm (M = 15, re = 0.05) for comparison. Moti¬ 
vated by the analysis of the GH algorithm in the i.i.d. setting we set A" = 6 for 
all tests of the EWGH algorithm. The EWGH algorithm compares favourably 
with the EWSA algorithm for all values of A, illustrating that the effectiveness 
of the algorithm is not critically dependent on this choice in the models we 
studied. The GH and EWGH algorithms outperform the EWSA algorithm in 
almost all cases. The dynamic quantile tracking ability of the EWGH algorithm 
is apparent in that it achieves better results than the GH algorithm when using 
an appropriate value of A. The individual tests are discussed below. 


7.2.1. Normal Distribution with Drift 

The normal distribution is ubiquitous. We consider simulated data drawn from 
a normal distribution with variance one and a mean of .006j at update j to 
simulate data with a linear trend in the mean. See figure 5 for plots of the 
EWGH RMSE for A = 0.01,0.05,0.1 at m = 100,400,1000 observations. The 
results for the GH and EWSA algorithms are also included for comparison. 
These figures illustrate that competitive results are achieved for all values of A 
considered for the EWGH algorithm. The value of A = 0.01 appears to provide 
the best results. See figure 7 for the RMSE curve for the EWGH algorithm with 
A = 6, A = 0.01 compared to the GH and EWGH algorithms. 


7.2.2. Exponential Distribution with Drift 

In this section we consider an exponential distribution with mean and variance 
equal to 1 + O.OOGj at update j to simulate data with a non-stationary mean 
and standard deviation. The dispersion of the data increases as the number 
of observations grows. See figure 6 for plots of the EWGH RMSE for A = 
0.01,0.05,0.1 at m = 100,400,1000 observations. The results for the GH and 
EWSA algorithms are also included for comparison. These figures illustrate 
that competitive results are again achieved for all values of A considered for the 
EWGH algorithm. The value of A = 0.01 appears to provide the best results. 
See figure 8 for the RMSE curve for the EWGH algorithm with A = 6, A = 0.01 
compared to the GH and EWGH algorithms. 


Fig 1: Chi-Squared Distribution: GH RMSE for N = 4, 6, 8,10,12 at m = 100,400,4000 observations for the p = 0.5, 0.9, 0.99 
quantiles (including 95% percentile bootstrap confidence intervals). The exact quantiles are 4.3515, 9.2364 and 15.0863 for 
comparison. 
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Fig 2: Exponential Distribution: GH RMSE for N = 4,6,8,10,12 at m = 100,400,4000 observations for the p = 0.5,0.9,0.99 
quantiles (including 95% percentile bootstrap confidence intervals). The exact quantiles are 0.6931, 2.3026 and 4.6052 for 
comparison. 
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Fig 4: RMSE curves associated with the exponential distribution for the 0.5, 0.9 and 0.99 quantiles (including 95% percentile 
bootstrap confidence intervals). The exact quantiles are 0.6931, 2.3026 and 4.6052 respectively. The Gauss-Hermite algorithm 
utilises N = 6. 
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Fig 5: Normal Distribution with Drift: EWGH RMSE for A = 0.01,0.05,0.1 at m = 100,400,1000 observations for the 
p = 0.5,0.9,0.99 quantiles (including 95% percentile bootstrap confidence intervals). The GH and EWGH algorithms utilise 
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Fig 6: Exponential Distribution with Drift: EWGH RMSE for A = 0.01,0.05,0.1 at m = 100,400,1000 observations for the 
p = 0.5,0.9,0.99 quantiles (including 95% percentile bootstrap confidence intervals). The GH and EWGH algorithms utilise 
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Fig 8: RMSE curves associated with the exponential distribution with mean and standard deviation = 1 + 0.006j for the 0.5, 
0.9 and 0.99 quantiles. The final exact quantiles at j = m = 1000 are 4.8562, 16.1319 and 32.2638 respectively. The GH and 
EWGH algorithms utilise N = 6. The EWGH algorithm utilises A = 0.01. 
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8. Real Data Results 

In this section we test the GH and EWGH algorithms on a real data set to 
evaluate their effectiveness in a non-idealised setting. In particular we consider 
one month (January 2013) of high frequency forex return data, namely EURUSD 
spot mid-price returns intervaled at 15 seconds. We begin with the mid-price 
series: 


Pt(i) 5 Pt( 2 ) 5 • • • Pt(m) 5 ^(i+i) ^(i) ^ seconds, 

which we transform online to obtain arithmetic returns: 

r(i), r(2),... r^rn-i ), r^j) = - Pt^j^ • 

The summary statistics of the arithmetic returns (in pips, 1 pip = 0.0001) 
are as follows (data set size = 71797): 


Statistic 

Value (Pips) 

Mean 

1.0449e-05 

Standard Deviation 

1.0852 

Skewness 

-0.0222 

Kurtosis 

16.5736 


Rftturn S#ri#s 


1 1 

- 

Pri« S#ri#s 

1 1 

_1_1_ 



Eig 9: An extract of the forex returns and price series for 2013-01-28 to 2013-01- 
31. Noteworthy features include the prevalence of return outliers and distinctive 
changes in return variance. Also apparent are periods of pronounced, but tem¬ 
porary trending behaviour corresponding to changes in the mean of the return 
distribution. 
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Accurately tracking the quantiles of this return series is a non-trivial check 
of our methods - the distribution of these returns is non-stationary and the fre¬ 
quency of outliers is high. The frequency of outliers is related to the fact that 
the distribution of the returns is heavy-tailed as evidenced by the high kurtosis. 
This will probe the dynamic quantile estimation performance in the setting of 
general non-stationarity as well as evaluating the robustness of the algorithms. 
Applications of online quantile estimation for financial price series include iden¬ 
tifying high frequency trading opportunities, real-time risk estimation (such as 
the calculation of real time Value at Risk, VaR) and outlier detection. 

Our test is as follows: we count the number of times the {i + l)th observation 
is smaller than the online estimates of the 0.5 (median), 0.9 and 0.99 quantiles 
obtained up to observation i. These counts are then normalised by the total 
number of observations. Ideally, the out-of-sample observation should be smaller 
than the median quantile with probability 0.5. Similarly, the out-of-sample ob¬ 
servation should be smaller than the 0.9 quantile with probability 0.9 and it 
should be smaller than the 0.99 quantile with probability 0.99. We compare the 
observed frequencies to these probabilities and report 95% percentile bootstrap 
confidence intervals for the observed frequencies. The bootstrap confidence in¬ 
tervals are created by calculating the observed frequencies for each day in the 
period in question (January 2013), resampling the resultant daily frequencies 
with replacement (1000 resamples) and providing the 0.025 and 0.975 quantiles 
of the resampled distribution. We also include the results for the EWSA algo¬ 
rithm for comparison. The parameters used for the algorithms are as follows: 
V = 6 for the GH and EWGH algorithms. This choice of N was motivated 
by the simulation results and computational efficiency considerations. Eor the 
EWGH algorithm, A = 0.05. This choice was motivated by an analysis of an 
initial sample (not contained within the January 2013 data set) and by the fact 
that we expect forex return quantiles to vary rapidly around news events and 
thus a choice of A bigger than 0.01 (the best choice in the simulation studies) 
appeared appropriate. The parameters for the EWSA algorithm are the same 
as in section 7. 


Algorithm 

p=0.5 

II 

o 

p=0.99 

GH 

0.469 [0.463,0.475] 

0.860 [0.849,0.873] 

0.974 [0.971,0.978] 

EWGH 

0.500 [0.499,0.502] 

0.894 [0.892,0.895] 

0.992 [0.991,0.992] 

EWSA 

0.501 [0.500,0.502] 

0.897 [0.895,0.899] 

0.976 [0.973,0.979] 


The EWGH algorithm performs well overall and is the only algorithm that 
does not underestimate the p=0.99 quantile. This provides evidence that not 
only does the EWGH algorithm perform well in a non-stationary environment, 
but also that it handles heavy-tailed distributions well. Note that we have per¬ 
formed the same test on other months to check the consistency of the results 
and the same behaviour emerges (omitted for brevity). We re-iterate that for 
the GH and EWGH algorithms, we can obtain online estimates of any quantile 
of interest. 
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9. Conclusion 

In this article we have defined a cumulative distribution function estimator based 
on Hermite series estimators which allows quantiles to be obtained numerically. 
For probability densities with support on the positive half-real line, we have 
proven asymptotic MSE consistency (pointwise consistency) as well as asymp¬ 
totic consistency based on a Cramer-von Mises like criterion for this estimator. 
We have also provided the associated rates. In addition, we have demonstrated 
that quantile estimates obtained from this estimator directly depend on the 
MISE of the Hermite series density estimator under certain conditions. While 
these results are novel and interesting in their own right, our particular appli¬ 
cation of interest is that of sequential quantile estimation. In this setting, the 
Hermite series based estimators are biased in general. They are still very useful 
in practice however. In particular, we have introduced algorithms - based on 
the Gauss-Hermite expansion - for online quantile estimation in the settings of 
static quantile estimation and dynamic quantile estimation. These algorithms 
have 0(1) time complexity for updating the distribution and quantile function 
estimates. 

In the static quantile estimation setting we have exploited the fact that Gauss- 
Hermite coefficients can be updated in a sequential manner. To treat dynamic 
quantile estimation, we have introduced a novel expansion with an exponentially 
weighted estimator for the Gauss-Hermite coefficients which we have termed 
the Exponentially Weighted Gauss-Hermite (EWGH) expansion. This expan¬ 
sion should allow the local behaviour of a non-stationary stream of data to be 
tracked. To make our analysis concrete, we have considered i.i.d data streams 
and independent, non-identically distributed data streams in our simulation 
studies and our theoretical analysis. The simulation studies revealed the Gauss- 
Hermite based algorithms to be competitive with a leading existing algorithm 
for online quantile estimation. In addition, a test on real forex data confirmed 
the effectiveness of the EWGH algorithm in a more general and realistic setting 
and provided evidence that our techniques are effective for heavy-tailed distri¬ 
butions. 

The particular usefulness of our algorithms is that they allow arbitrary quan¬ 
tiles to be estimated in an online manner. They do not require a particular set 
of quantiles to be specified upfront, which is a limitation of existing algorithms. 
In obtaining these novel algorithms, we have thus provided a solution to the 
problem of online distribution function and online quantile function estimation 
for both stationary and non-stationary data streams. Online estimates of these 
functions are in fact useful in a broader context in that any function of these 
quantities can be calculated in an online manner. These online estimates could 
also be used in online machine learning applications for example. 
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Appendix A: Standardising the observations 

It is reasonable to assume that in practice, the quality of the fit yielded by the 
truncated Gauss-Hermite expansion should be better if applied to standardised 
random variables (with mean equal to zero and standard deviation equal to 
one). For a random variable x with mean /i and standard deviation a we have 
the standardised random variable: 


X — 11 

X = -, 

a 

with probability density f{x) = cr/(crx-h/i). The associated truncated Gauss- 
Hermite expansion is: 


where 


N 

fix) = y^^akHk{x)Z{x), 

k=0 


tftk — 



Z{x)f{x)Hk{x)dx. 


Ideally, we would utilise: 


ak{li,(T) = ak- y - )Hk[ -), 

n a a 

1=1 

to estimate the coefficients. In practice however, we do not know the true 
values of /i and a and thus we must use estimates of these values, ft and a. 
In general, the effect of using these estimates is to bias the estimate of 
This can be seen by considering the Taylor expansion of d/c(/i, a) which implies 
E{dk{jxD) ^ E{ak{ti,cT)). 

Despite this bias, standardising using the estimated mean and standard de¬ 
viation improves the quality of the fit in many cases. In the sections below 
we provide online algorithms to estimate the mean and standard deviation in 
both the static and dynamic setting. These estimates can then be plugged into 
the standard Gauss-Hermite coefficients and the exponentially weighted Gauss- 
Hermite coefficients respectively. 
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A,l, Static Quantile Estimation 

The usual estimators of the mean and standard deviation can be calculated in 
an online way. The mean (ftp) can be updated with each new observation as: 


Al = ^1, 


fik = ^{{k - + Xk), k>2. 

The standard deviation {ap) can also be estimated in an online manner. To 
avoid numerical precision problems an algorithm such as that originating from 
Welford [30] can be applied. 


Ml = xi, 


M/e — M/c_i + 


^1=0, 

-M/e — i) 


k 


Sk = Sk-i + {xk - Mk-i) {xk — Mk), 


^k — 


Sk 

k — 1 


, k>2. 


A.2. Dynamic Quantile Estimation 

For the purposes of obtaining a local estimate of the mean and standard devia¬ 
tion we can utilise EWMA estimators: 

Al = ^1^ 

fik (1 1 T ^Xk") k ^ 2^ 

Vi = l, 

V/e = (1 — X)Vk-l + ^{Xk — ftkY' ^ k > 2^ 



0 < A < 1. 
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Appendix B: MISE of the Exponentially Weighted Gauss-Hermite 
Expansion 

BA, MISE of the Exponentially Weighted Gauss-Hermite 
Expansion for IID Data 

Proposition 3. Let the true probability density funetion f{x) G L 2 and 
^|X|3 < 00 . The MSB of the eoeffeients (16) have the following upper bound 
for a sample of n 1 observations in the i.i.d. ease: 


MSE{ak) < 


X 


[1-(1-AH+(1-A) 


\2n 


^/Bc{k + 1 ) 


_2-A 
where c is a eonstant. 

Proof The truncated EWGH expansion of f{x) is given by: 

N 




fN{x) = ^dkHk{x)Z{x), 


/c=0 


where the coefficient estimates are given by (16). In what follows we assume 
an i.i.d sequence of n + 1 observations has been drawn from f{x) i.e. x^ ^ f{x). 


The estimator (16) can be equivalently written as: 


n —1 

afe = A y](l - \y [akZ{y.n-j)Hk{y^n-j)] + (1 - A)" [Q!feZ(xo)i?jk(xo)]. 

3=0 

The expected value of this estimator, E {ap) = ap and thus the estimator is 
unbiased. 

The variance, Var(d/c) = E [{dp — n/c)^] of the estimator is: 


n—1 


Var(afc) = A^ ^(1 - A)2iVar ([afeZ(x„_,)fffc(x„_,)]) 
i=o 

+ (1 - A)2”Var ([afeZ(xo)i/fc(xo)]) 

A 


2-A 


[1-(1-A)^"]+(1-A) 


2n 


ax{k), 


(30) 


where cr|.(fc) = Var [akZ{x)Hk{x)] = E [Z{x)f [Hk{x)f -al- Note 

that we have exploited the independence of the observations x*. 
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Now, we can obtain an upper bound on using the properties of Hermite 

polynomials (following from (4) and (5), see [12]): 


cr^(fc) < 


V^c(fc + l)-l/2 

2^-n\ 


This yields the following bound on the MSE of ap (since ap is unbiased, the 
bound on the MSE of ap is equal to the bound on the variance): 


MSE(a/,) = 


< 


E |^(h/0 CLp^ j 

^ [1 - (1 - A) 2 "] + (1 - A) 2 " 


0Fc(fc+ l)-i/2 

2^-^k\ 


• (31) 


□ 

Theorem 8. Let the true probability density funetion f{x) G L 2 and E\X\i < 
00 . If r > 1 derivatives of f{x) exist and {x — -^Yf{x) ^ ^2 then the MISE 
for the EWGH expansion in the i.i.d. ease behaves as follows for N suffieiently 
large and n ^ 00 : 


MISEifN) =0{W/‘^) 


2 - A 


0{N-^). 


Proof If r > 1 derivatives of f{x) exist and {x — f{x) ^ L 2 then 
YYk=N+i ~ 0{N~^) (see [29],[12]). Combining this fact and the bound 

on MSE(a/c) (proposition 3) with the expression for the MISE (12) we obtain 
the following: 


N 


2^-^k\ 


MISE (/at) = ^ [(afe - afef] + ^ 

fc=0 k=N+l 


2^-^k\ 

EX 


al 


< 


A 


— [1 - (1 - a ) 2 "] + (1 - X 


2-A 

+ 0{N-^) 

Eor N sufficiently large and n x 00 we have: 

MISE(/^) =0{W/^) jE 


N 


k=0 


■0(A^-^). 


□ 
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B,2, MISE of the Exponentially Weighted Gauss-Hermite 
Expansion for Non-identically Distributed Data 

In this section we extend the results derived for i.i.d data to non-identically 
distributed, independent data. We consider the case where we observe s + 1 
observations from a probability distribution fi G L 2 followed by a further t 
observations from a second distribution /2 G L 2 i.e. we assume an independent 
sequence of 5 + 1 observations xq, ... 5 X 5 ^ fi{x) followed by an independent 
sequence of t observations, x^+i,... ,Xt+s ^ / 2 (^)- We consider this a funda¬ 
mental example of non-identically distributed data. We evaluate and bound the 
MISE of fN{x) compared to f 2 {x) at observation t s. We denote the true 
Gauss-Hermite coefficients of fi{x) as and the true Gauss-Hermite coeffi¬ 
cients of f 2 {x) as 

Proposition 4. The MSE of the eoeffieients (16) eompared to have the 
following upper bound after s^l independent observations from fi G L 2 followed 
by t independent observations from f 2 G L 2 provided E\X\^ < oc for both 
distributions: 


E 


< 


{ak-a^S) 


c^/n{k + 1 ) 


4(1 - A)"* + 


2 - A L 


1 - (1 - + (1 - 


Proof. The expected value of the exponentially weighted estimator for the 
Gauss-Hermite coefficients is: 


s-l-t-l 

E{ak) = X {1-XyE{[akZ{xs+t-j)Hk{^3+t-j)]) 

j=0 

+ (l-A)*+*E([afeZ(xo)fffc(xo)]) 

s+t-1 t-1 

= Aa^ ^ (1 - A)^' + Aaf - A)^' + (1 - A)*+‘ay 
= af+ (l-A)*[ay-af]. 


(32) 


Thus the squared bias of the estimator compared to the true Gauss-Hermite 
coefficient is: 


M 2 


E{ak)-af'>] =(l-A)2‘[ay-afy. 

Similarly, the variance Var(a/c) = E [{ap — E (d/c))^] of the estimator is: 
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s+t-1 t-1 

Vai{ak) = ^ 

j=t j=0 

+ (l_A)2(«+t)[(^W(fc)]2 

= [^(1 - A)^* + [l - (1 - A)2(«+*) 

+[ 4 \k)rj^[i-{i-xr], 

where = Var [akZ{x)Hk{x)] ,x ~ fi{x) and 

[cr^x\k)]'^ = Yax[akZ{x)Hk{x)] ,x ^ f 2 {x). 

(2) 

Thus the mean squared error of compared to ^ is: 


(33) 


E (^CLk - ^ (“*) “ ^ + Var(afe) 

= (l-A)2*[4^)-aff 

+ [<r^x\k)? [^(1 - A)2* + [l - (1 - A)2(*+‘) 

+ [<T^^\k)rj^[l-{l-Xr]. (34) 

The MSE therefore depends on the difference between the true Gauss-Hermite 
coefficients of fi{x) and f 2 {x) and the number of observations since the switch 
between the distributions (along with the value of A). We can bound the MSE 
above using the properties of Hermite polynomials: 



Theorem 9. Given 5 + 1 independent observations from fi G L 2 followed by t 
independent observations from f 2 ^ L 2 , if r > 1 derivatives of f 2 {x) exist and 
{x — f 2 {x) G 1/2 and E\X\3 < 00 for both distributions then the MISE for 

the EWGH expansion behaves as follows for N sujfieiently large: 


MISE if n) = 4(1 _ xft ^ [1 _ (1 _ A)2(«+*)] + (1 - A)2(*+*) 

2 — A L J 

+ 0{N-^). 
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Proof. If r derivatives of f{x) exist and {x — f{x) ^ ^2 then 

YYk=N+i ^ ~ 0{N~'^) (see [29],[12]). Combining this fact with the bound 

on the MSE in proposition 4 and the expression for the MISE (12), we obtain: 


MISE (/at) = 

+ 0{N-^). 


4(1 - A)"* + 


2-A L 


1 - (1 - + (1 - 


□ 
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